-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add DruidContainers to run docker tests with embedded-test framework #18302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
embedded-tests/src/test/java/org/apache/druid/testing/embedded/docker/IngestionDockerTest.java
Fixed
Show fixed
Hide fixed
kgyrtkirk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a few comments here and there...some of them might be considered personal preference/etc so not addressing all is ok as well :)
.github/workflows/ci.yml
Outdated
| uses: ./.github/workflows/worker.yml | ||
| with: | ||
| script: .github/scripts/run_unit-tests -Dtest=!QTest,'${{ matrix.pattern }}' -Dmaven.test.failure.ignore=true | ||
| script: .github/scripts/run_unit-tests -Dtest=!QTest,!*DockerTest*,'${{ matrix.pattern }}' -Dmaven.test.failure.ignore=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quilty of not moving the QTest into the validation phase ; which seem to have given rise to this pattern which could be considered bad practice
this is not really serious...it could be fixed separaetly - but nicely outlines your requirement that you just wanted to add 1 more test....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Please check the latest comments.
.github/workflows/docker-tests.yml
Outdated
| source .github/scripts/distribution_checks_env.sh | ||
| .github/scripts/run_unit-tests -Dtest=*DockerTest* -Ddruid.testing.docker.image=$DRUID_DIST_IMAGE_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of trying to reuse the run_unit-tests script; you could possibly make a separate one for this; and run that (you could still invode run_unit-tests from there if you want)
you should preferrably also use the worker.yml to run it....that way if you want to test the workflow you could just try to run that script on your own machine!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of trying to reuse the run_unit-tests script; you could possibly make a separate one for this; and run that (you could still invode run_unit-tests from there if you want)
Could you please elaborate? What would be the advantage of putting this in a separate script if we would still invoke run_unit-tests from there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could run it locally to reproduce what's happening on the CI; you could call mvn directly; but you will also need to build the image...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added script run_docker-tests. To run all Docker tests locally,
$ DRUID_DIST_IMAGE_NAME=apache/druid:34.0.0-rc1 .github/scripts/run_docker-tests
This doesn't include the image building yet though.
...ded-tests/src/test/java/org/apache/druid/testing/embedded/docker/DruidContainerResource.java
Outdated
Show resolved
Hide resolved
...ded-tests/src/test/java/org/apache/druid/testing/embedded/docker/DruidContainerResource.java
Outdated
Show resolved
Hide resolved
| private static void createLogDirectory(File dir) | ||
| { | ||
| try { | ||
| FileUtils.mkdirp(dir); | ||
| Files.setPosixFilePermissions(dir.toPath(), PosixFilePermissions.fromString("rwxrwxrwx")); | ||
| } | ||
| catch (Exception e) { | ||
| throw new RuntimeException(e); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couldn't the MountedDir provide a place for this method? that way if someone needs similar in the future it might got reused...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated ,thanks for the suggestion!
embedded-tests/src/test/java/org/apache/druid/testing/embedded/docker/DruidContainers.java
Outdated
Show resolved
Hide resolved
| return new DruidContainerResource(DruidCommand.INDEXER) | ||
| .addProperty("druid.lookup.enableLookupSyncOnStartup", "false") | ||
| .addProperty("druid.processing.buffer.sizeBytes", "50MiB") | ||
| .addProperty("druid.processing.numMergeBuffers", "2") | ||
| .addProperty("druid.processing.numThreads", "5"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like the DruidCommand was over and underused at the same time...
- it contains some stuff so
new DruidContainerResource(DruidCommand.INDEXER)could create a valid object - its underused as a good one will have these 4 properties set...
feels like there should be a DruidOverlordContainer / DruidIndexerContainer and similar....it could still be an innerclass here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I have moved all required properties to DruidCommand so that all users of DruidContainer can get those default properties out of the box.
|
Thanks for the suggestions, @kgyrtkirk ! I have made the following changes:
I am not very clear on how the usage of |
| public class IngestionDockerTest extends EmbeddedClusterTestBase | ||
| { | ||
| static { | ||
| System.setProperty("druid.testing.docker.image", "apache/druid:35.0.0-SNAPSHOT"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting this in a static initializer doesn't seem right, because it will be set when the class is loaded, not necessarily when the test runs. Maybe better to use @BeforeAll.
Although, it'd be even better to not have to set the property at all. Like, perhaps the property just sets a default, but tests are able to specify their own when constructing the DruidContainer. That's how it works for other kinds of testcontainers.
Also- about the value- do we need to change the value when Druid's version is upgraded? Is it possible to get it programmatically, such as from DruidContainerResource.class.getPackage().getImplementationVersion()? Should work when running from a jar, possibly not from an IDE.
Finally, if we are going to keep needing to set this property, better to refer to the name by DruidContainerResource.PROPERTY_TEST_IMAGE rather than hardcoding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting this in a static initializer doesn't seem right, because it will be set when the class is loaded, not necessarily when the test runs. Maybe better to use @BeforeAll.
Sorry, this was not meant to be committed and was for my local testing only.
Must have crept in in a recent commit.
In CI workflows, this property is always passed as a Java argument -Ddruid.testing.docker.image.
So, it needs to be set only when running the test locally.
I chose purposefully not to use a default image so that each test has to set it explicitly.
KafkaContainer seems to have deprecated the no-arg constructor too, so that test writers explicitly call out the required image name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For local testing, what do you think would be the best way to specify this property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to require people to set the system property for local testing. Some thought would be needed on part of the person running the tests about what image they want to use. If they want to test against their current branch, they'll need to build a Docker image. Maybe they don't care about that; maybe they just want to test the Docker embedded test itself, in which case it would be fine to run against a released Apache image.
The error message when the property isn't set should spell this out. It should mention that the property is typically set during CI, but isn't set outside CI because we don't know what image you want to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thought also needs to be given to how committers should run mvn test when they test release builds as part of release voting. Do they need to build a Docker image first in order to run this test? Do they need to specify -Ddruid.testing.docker.image to mvn? Or is it automatic as part of the mvn workflow? I am curious what you think about these items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing this up, @gianm !
For release voting, I feel it would be best for committers to download the image built as part of the RC and then running mvn test -Ddruid.testing.docker.image=apache/druid:rc1 -Dtest=*DockerTest*.
Since the release voting process is not automated anyway (perhaps by design so that voters are aware of the steps they go through), it is probably okay to require specifying -Ddruid.testing.docker.image explicitly.
We can probably add these instructions to release guidelines (and the vote mail) so that users can always copy paste the commands.
What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message when the property isn't set should spell this out. It should mention that the property is typically set during CI, but isn't set outside CI because we don't know what image you want to use.
Updated the message in DruidContainerResource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the test should be disabled by default, and enabled using a mvn profile. The case for disabling it by default is that it isn't actually testing the current code in the repository. It's testing a image pulled from somewhere else. Having it be its own command makes it more clear what is happening. During voting if committers want to test the docker image too, that can be done with a special mvn command separate from the typical test command.
How does this sound to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion!
I agree that this would be the cleanest approach.
embedded-tests/src/test/java/org/apache/druid/testing/embedded/docker/IngestionDockerTest.java
Outdated
Show resolved
Hide resolved
embedded-tests/src/test/java/org/apache/druid/testing/embedded/docker/IngestionDockerTest.java
Outdated
Show resolved
Hide resolved
...ded-tests/src/test/java/org/apache/druid/testing/embedded/docker/DruidContainerResource.java
Outdated
Show resolved
Hide resolved
...ded-tests/src/test/java/org/apache/druid/testing/embedded/docker/DruidContainerResource.java
Outdated
Show resolved
Hide resolved
embedded-tests/src/test/java/org/apache/druid/testing/embedded/docker/IngestionDockerTest.java
Outdated
Show resolved
Hide resolved
embedded-tests/src/test/java/org/apache/druid/testing/embedded/docker/IngestionDockerTest.java
Outdated
Show resolved
Hide resolved
services/src/test/java/org/apache/druid/testing/embedded/EmbeddedDruidCluster.java
Outdated
Show resolved
Hide resolved
...ded-tests/src/test/java/org/apache/druid/testing/embedded/docker/DruidContainerResource.java
Fixed
Show fixed
Hide fixed
|
@gianm , I also needed to include some other changes to provide visibility into the containers and are very useful in writing these Docker-based tests.
I hope these extra changes have not made the PR too bloated. |
capistrant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! few nit comments here and there. My biggest question is the one on container friendly embedded hostnames and why it is not just the default way services are setting up their hostname
server/src/test/java/org/apache/druid/server/metrics/LatchableEmitter.java
Outdated
Show resolved
Hide resolved
...d-tests/src/test/java/org/apache/druid/testing/embedded/psql/PostgreSQLMetadataResource.java
Outdated
Show resolved
Hide resolved
...-tests/src/test/java/org/apache/druid/testing/embedded/psql/PostgreSQLMetadataStoreTest.java
Outdated
Show resolved
Hide resolved
...d-tests/src/test/java/org/apache/druid/testing/embedded/docker/CustomNodeRoleDockerTest.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public EmbeddedDruidCluster createCluster() | ||
| { | ||
| coordinator.usingImage(DruidContainer.Image.APACHE_31); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this something someone is going to have to periodically update and commit to master?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think we can keep testing against 31 until we decide to make some major change which is not backward compatible with 31 anymore.
I didn't try using an image older than 31, but we could probably go as far back as 27 and still be backward compatible (at least for the things that are being verified in these tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test will get skipped by default since it has DockerTest in the name. IMO, it's sensible to always run this particular test. The main reason I thought that Docker tests in general shouldn't all run is that they're mostly intended to run against the "current code". They don't serve their purpose unless the person running the tests actually creates an image matching the current checked out repo. This needs to be done manually so it makes sense that it's a separate command.
But, the backwards compatibility test isn't intended to deploy the "current code". It wants to deploy a specific older version. So, it doesn't have that problem, and it's therefore good to always run it.
Btw, IMO it's nicer to use annotations rather than test names to control what tests run in a standard mvn run. The downside of using names is that it's indirect. It makes it less obvious which tests run and which don't, and also means that if you want to change whether a test runs, you have to rename it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, IMO it's nicer to use annotations rather than test names to control what tests run in a standard mvn run. The downside of using names is that it's indirect. It makes it less obvious which tests run and which don't, and also means that if you want to change whether a test runs, you have to rename it.
I agree. In the current code, I am using the @Tag("docker-tests") annotation (applied on DockerTestBase) to skip the Docker-based tests.
The *DockerTest* class name is to tell mvn verify to look for these classes while searching for eligible integration tests. Otherwise, it only looks for *IT or IT* classes by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, the backwards compatibility test isn't intended to deploy the "current code". It wants to deploy a specific older version. So, it doesn't have that problem, and it's therefore good to always run it.
Thanks for the suggestion!
In that case, I will remove the BackwardCompatiblityIngestionDockerTest from the DockerTestBase hierarchy. I will also update it to use only Druid 31 containers and embedded servers for it (currently it uses a mix of Druid 31 containers, current code Druid containers and embedded servers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. In the current code, I am using the
@Tag("docker-tests")annotation (applied onDockerTestBase) to skip the Docker-based tests.
The*DockerTest*class name is to tellmvn verifyto look for these classes while searching for eligible integration tests. Otherwise, it only looks for*ITorIT*classes by default.
Ah, I see. I missed that when reading the maven stuff.
| * Uses a container-friendly hostname for all embedded services, Druid as well | ||
| * as external. | ||
| */ | ||
| public EmbeddedDruidCluster useContainerFriendlyHostname() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the reason of having clusters opt into containerFriendlyHostname? Is there a downside or problem of using it if you don't have to, say for an existing embedded test you have written, HighAvailability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point!
All the existing embedded tests would still work if we always used the containerFriendly hostname. (since the Docker tests already use embedded services and they are able to connect to each other seamlessly)
In fact, that's how it used to be (except we were using InetAddress.getLocalHost().getCanonicalHostName() instead of InetAddress.getLocalHost().getHostAddress()),
but we changed it in #18228 as there were some apprehensions with using the canonical host name,
ref https://github.com/apache/druid/pull/18228/files#r2197340379.
But now that we are using the getHostAddress() which is simply the IP address, I think we can just stick to using the containerFriendly one all the time.
cc: @clintropolis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an ease of use perspective I think it would be awesome to get to a point where we default to container friendly so people who are writing new tests don't need to have to decide/realize their test requires it. But I don't think it is blocking for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current default (i.e. localhost) should already be good enough for all (non container) embedded tests.
For Druid container based tests, users writing new tests need not be conscious of the choice as long as they extend DockerTestBase.
Also, since Druid containers are not meant to be used frequently except in a couple of smoke tests, the extra step is probably okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya, I agree. I'm probably just overthinking it since I'm currently working on the KafkaDataFormat tests which requires an extra docker container that can talk to the kafka container :)
|
Thanks for the suggestions, @capistrant ! I have updated the javadocs as suggested and replied to your other comments. |
capistrant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm cool with current state of this. Will defer to you @kfaraz if you want to get final word from Gian and/or Zoltan on your changes after their review before you merge
gianm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, since I have one comment remaining but I don't consider it blocking: #18302 (comment)
|
Thanks for the reviews, @kgyrtkirk , @capistrant , @gianm ! |
This patch is based on the changes originally proposed in #18265.
Since that PR had a very long history of tuning the workflows (sincere thanks to @Akshat-Jain for the assistance!),
I have created a fresh PR instead to simplify reviews.
The old PR had only a couple of review comments which have already been answered.
#18265 (comment)
#18265 (comment)
Summary
CliEventCollectorto monitor the containers and write efficient testsdocker-testsin GHA which runs a few Docker-based testsHow to run
From Druid root directory:
(This commands runs successfully for the ongoing release RC. Only 1 test for custom node role fails as it requires a minor change to the
druid.shscript that is used to start Druid services in the containers)Main changes to review
DruidContainerDruidCommandDruidContainerResourceDockerTestBaseCliEventCollectorImportant scripts
.github/workflows/docker-tests.yml.github/scripts/run_docker-testsNew test classes
IngestionDockerTestCustomNodeRoleDockerTestIngestionBackwardCompatibilityDockerTestHttpEmitterEventCollectorTestChange details
druid-testcontainersthat can be later published as an individual module forTestcontainersDruidContainer-Testcontainerimpl for running individual Druid servicesDruidContainerResourceto allow use ofDruidContainerin embedded cluster testsEmbeddedDruidServerandEmbeddedDruidClusterto supportrunning both embedded servers and container-based services in the same cluster
distribution-checks.ymljobCliCustomNodeRoletoCliEventCollector, which is a custom Druid node with commandeventCollector.ITHighAvailabilityTestand related filesNote: Usage of
DruidContainerwill not be the norm but the exception to test out backward compatibilityand/or Docker related changes only. All other use cases should continue to use
EmbeddedDruidServers.CliEventCollectorBoth the standard ITs and revised ITs used a
CliCustomNodeRoleinITHighAvailabilityTestThis patch removes all the code related to these.
ITHighAvailabilityTesthas already been migrated to the embedded version,HighAvailabilityTest.The
CliCustomNodeRolehas been renamed toCliEventCollectorand moved totesting-toolsThis class now serves a dual purpose:
CustomNodeRoleDockerTestHttpPostEmitter.LatchableEmitter.Cli*provides a lot of wiring out of the box without having to write new code.Sample test run
https://github.com/apache/druid/actions/runs/16416105923/job/46388247532?pr=18302
Advantages
which use a custom IT-only image
Next steps
druid-testcontainersevolves, we can try to contribute it to the mainTestcontainersrepoEmbeddedClusterTestBaseEmbeddedDruidServer(and required containers only, likeKafkaResource)DruidContainer, some of which will use MiddleManagersThis PR has: